ExATOlp: extraction of language resources from Portuguese corpora

Authors

  • Lucelene Lopes
  • Renata Vieira
  • Paulo Fernandes
  • Gabriel Couto
Abstract

This paper presents four main features of the ExATOlp software tool. These features provide the following language resources: corpus-relevant terms with their morpho-syntactic and frequency features; a concordancer (term contexts); concept tags; and concept hierarchies. The tool emphasizes the high quality of the extracted terms. The resources it provides offer a concise representation of non-obvious characteristics of those terms.

Similar resources

Descoberta Automática de Relações Não-Taxonômicas a Partir de Corpus em Língua Portuguesa

Ontology construction is a complex process composed of extraction tasks for domain concepts, as well as for taxonomic and non-taxonomic relations among concepts. The extraction of non-taxonomic relations is the most neglected task, especially for Portuguese texts. Therefore, this paper presents a proposal for extracting non-taxonomic relations from Portuguese texts represented by a list of concepts ...

Extracting semantic relations from Portuguese corpora using lexical-syntactic patterns

The growing investment in automatic extraction procedures, together with the need for extensive resources, makes semi-automatic construction a viable and efficient new strategy for developing language resources, combining accuracy, size, coverage and applicability. These assumptions motivated the work described in this paper, aiming at the establishment and use of lexical-syntactic patterns f...

Building a Corpus for Named Entity Recognition using Portuguese Wikipedia and DBpedia

Some natural language processing tasks can be learned from example corpora, but having enough examples for the task at hand can be a bottleneck. In this work we address how Wikipedia and DBpedia, two freely available language resources, can be used to support Named Entity Recognition, a fundamental task in Information Extraction and a necessary step of other tasks such as Co-reference Resoluti...

Speech Recognition for Brazilian Portuguese using the Spoltech and OGI-22 Corpora

Speech processing is a data-driven technology that relies on public corpora and associated resources. In contrast to languages such as English, there are few resources for Brazilian Portuguese (BP). This work describes efforts toward narrowing this gap and presents systems for speech recognition in BP using two public corpora: Spoltech and OGI-22. The following resources are made available: AT...

EχATOLP – An Automatic Tool for Term Extraction from Portuguese Language Corpora

This paper describes EχATOLP, a software tool that extracts significant terms from an annotated corpus written in Portuguese about a specific domain of interest. Based on linguistic and statistical approaches, the tool extracts terms that are frequent and syntactically relevant to the domain of interest.

Publication date: 2012